Summary. Many complex biological, social, and economic networks show topologies
that differ drastically from random graphs. But what is a complex network, i.e., how can
one quantify the complexity of a graph? Here the Offdiagonal Complexity (OdC), a new
and computationally cheap measure of complexity, is defined. It is based on the node-node
link cross-distribution, whose nondiagonal elements characterize the graph structure beyond
link distribution, cluster coefficient and average path length. The OdC approach is applied
to the Helicobacter pylori protein interaction network and randomly rewired surrogates
thereof. In addition, OdC is used to characterize the spatial complexity of cell aggregates.
We investigate the earliest embryo development states of Caenorhabditis elegans. The
development states of the premorphogenetic phase are represented by symmetric binary-valued
cell connection matrices with dimension growing from 4 to 385. These matrices can
be interpreted as the adjacency matrix of an undirected graph, or network. The OdC approach
allows a quantitative description of the complexity of the cell aggregate geometry.
26.1 Complex networks

Following a series of seminal papers (Watts & Strogatz [1], Barabasi & Albert [2, 3, 4],
Dorogovtsev & Mendes [5], Newman [6]; see also [7] for an overview) since 1999,
small-world and scale-free networks have been a hot topic of investigation in a broad
range of systems and disciplines.

Metabolic and other biological networks, collaboration networks, the WWW, the internet,
etc., have in common that the distribution of link degrees follows a power law and
thus has no inherent scale. Such networks are termed 'scale-free networks'. Compared
to random graphs, which have a Poisson link distribution and thus a characteristic
scale, they share a number of distinct properties, especially a high clustering
coefficient and a short average path length. However, the question of the complexity of
a graph is still in its infancy. A 'blind' application of other complexity measures (as
for binary sequences or computer programs) does not account for the special properties
shared by graphs, and especially by scale-free graphs as they appear in biological
and social networks.

Mathematically, a graph (or synonymously in this context, a network) is defined
by a (nonempty) set of nodes, a set of edges (or links), and a map that assigns
two nodes (the "end nodes" of a link) to each link. In a computer, a graph may be
represented either by a list of links, each given as a pair of nodes, or equivalently
by its adjacency matrix $a_{ij}$, whose entries are 1 (0) if nodes $i, j$ are connected
(disconnected). Useful generalizations are weighted graphs, where the restriction of
$a_{ij}$ is relaxed from binary values to (usually nonnegative) integer or real values (e.g.
resistor values, travel distances, interaction coupling), and directed graphs, where
$a_{ij}$ no longer needs to be symmetric, and the link from $i$ to $j$ and the link from $j$
to $i$ can exist independently (e.g. links between webpages, or scientific citations).
In this chapter the discussion is limited to binary undirected graphs.
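
The two computer representations mentioned above can be converted into each other in a few lines. The following is a minimal sketch in Python, not code from the chapter; the function names `links_to_adjacency` and `adjacency_to_links` are hypothetical, nodes are assumed to be labeled 0 to N-1, and self-links are assumed to be ignored.

```python
def links_to_adjacency(n_nodes, links):
    """Build the symmetric 0/1 adjacency matrix of a binary undirected graph."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in links:
        if i == j:
            continue            # assumption of this sketch: no self-links
        adj[i][j] = 1
        adj[j][i] = 1           # undirected graph: keep the matrix symmetric
    return adj


def adjacency_to_links(adj):
    """Return the link list, each undirected link once as (i, j) with i < j."""
    n = len(adj)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]


# Small example: a square 0-1-2-3 with one diagonal 0-2.
links = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
adj = links_to_adjacency(4, links)
for row in adj:
    print(row)
```

Both representations carry the same information for binary undirected graphs; the link list is more compact for sparse networks, while the matrix form makes node degrees simple row sums.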

26.2 Complexity measures in biology 

In the biological sciences, the evolution of life is studied in detail and at large, and it
is observed qualitatively that evolution creates, on average, organisms of increasing
complexity. If one wants to quantify an increase of complexity, one has to define
suitable complexity measures. In some sense, the number of cells may be an indicator,
but it quantifies body size rather than complexity. Instead one may observe the
number of organelles, the size of the metabolic network, the behavioural complexity
of social organisms, or similar properties. A time series of the complexity
distribution of all organisms during evolution on earth would be highly interesting
for testing models of evolution, speciation and extinction. But apart from such
academic questions, there are many areas of practical use of complexity measures in
biology and medicine, such as the complexity of morphological structures, cell aggregates,
metabolic or genetic networks, or neural connectivities.

26.3 Other complexity measures 

For text strings (such as computer programs, or DNA) there are common complexity
measures in theoretical computer science, such as Kolmogorov complexity (and the
related Lempel-Ziv complexity and algorithmic information content AIC) [8]. For
example, AIC is defined by the length of the shortest program generating the string.
For random structures, and thus also for random graphs, these measures indicate high
complexity. This class of measures therefore usually cannot distinguish complex
(but still partly random) structures from completely random ones. For
this reason, measures of effective complexity [9] have been discussed; usually these
are defined as an entropy (or description length) of "a concise description of a set
of the entity's regularities" [9]. Here we are mainly interested in this second class,
and straightforwardly one would try to apply existing measures, e.g., to the link
list or to the adjacency matrix. However, mathematically it is not straightforward
to apply these text-string-based measures to graphs, as there is no unique way to
map a graph onto a text string.
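
The non-uniqueness of the graph-to-string mapping can be seen on the smallest nontrivial example. The sketch below is an illustration only; the helper names `adjacency_string` and `relabel` are hypothetical. It writes the adjacency matrix of a three-node path row by row as a string, once with the original node labels and once after swapping two labels. The graph is the same, yet the strings differ, so any string-based measure would have to fix a node ordering first.

```python
def adjacency_string(n_nodes, links):
    """Flatten the adjacency matrix row by row into a string of 0s and 1s."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in links:
        adj[i][j] = adj[j][i] = 1
    return "".join(str(x) for row in adj for x in row)


def relabel(links, perm):
    """Rename node i to perm[i]; the graph structure is unchanged."""
    return [(perm[i], perm[j]) for i, j in links]


path = [(0, 1), (1, 2)]                    # path graph 0 - 1 - 2
s1 = adjacency_string(3, path)
s2 = adjacency_string(3, relabel(path, [1, 0, 2]))   # swap labels 0 and 1

print(s1)   # 010101010
print(s2)   # 011100100
```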
Thus one desires to use complexity measures that are defined directly for graphs.
Two classical measures are known from graph theory, graph thickness and coloring
number, but they have a low "resolution" and their relevance for real networks is not clear.
Two new complexity measures have recently been proposed for graphs: Medium
Articulation [10] for weighted graphs (as they appear in food webs) and a measure for
directed graphs by Meyer-Ortmanns [11] based on the network motif concept [12].
Unfortunately, the latter two complexity measures are computationally quite costly.
A computational complexity approach has been defined by Machta and Machta [13]
as the computational depth of an ensemble of graphs (e.g. small-world, scale-free, lattice).
It is defined as the number of processing time steps a large parallel computer (with
an unlimited number of processors) would need to generate a representative member
of that graph ensemble. Unlike other approaches, it does not assign a single complexity
value to each graph, and again it is nontrivial to compute.

Table 26.1 gives a qualitative assessment of the behaviour of some of the mentioned
complexity measures for lattices in 2D and 3D, complex and random structures.
Note that especially the ability to distinguish nonrandom complex structures
from pure randomness differs between the approaches. Hence, a simpler estimator
of graph complexity is desired, and one possible approach, the Offdiagonal Complexity,
is proposed here. A striking observation on the node-node link correlation
matrices of complex networks [14, 15] is that entries are more evenly spread among
the offdiagonals, compared to both regular lattices and random graphs. This can
now be used to define a complexity measure for undirected graphs [14, 15].

This chapter is organized as follows. In Sec. 26.4 OdC is defined and illustrated
with an example. Sections 26.5 and 26.6 investigate the application of OdC to two
quite different biological problems: a protein interaction network, compared with
randomized surrogates, and a temporal sequence of spatial cell adjacency during
early Caenorhabditis elegans development, quantifying the temporal increase of
complexity.



Table 26.1. Qualitative assessment of various complexity measures. 
26.4 Definition of the Offdiagonal Complexity (OdC)

Definition (Offdiagonal complexity). Let $g_{ij}$ be the adjacency matrix of a graph
with $N$ nodes, i.e., $g_{ij} = 1$ if nodes $i$ and $j$ are connected, else $g_{ij} = 0$.

(i) For each node $i$ of the graph, let $l(i)$ be the node degree, i.e. the number of edges
(links),

$$ l(i) := \sum_{j=0}^{N-1} g_{ij} \tag{26.1} $$

(ii) Let $c_{mn}$ be the number of edges between all pairs of nodes $i$ and $j$ with node
degrees $m = l(i)$, $n = l(j)$ with $l(j) \ge l(i)$ (ordered pairs), i.e.,

$$ c_{mn} := \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} g_{ij}\, \delta_{m,l(i)}\, \delta_{n,l(j)}\, H(l(i) - l(j)) \tag{26.2} $$

Here $\delta$ is the Kronecker symbol and $H(x) = 1$ for $x \le 0$ and $H(x) = 0$ for $x > 0$.
Due to the pair ordering, the matrix $c_{mn}$ has entries only on the main diagonal
and above. Thus, $c_{mn}$ is a (not normalized) node-node link correlation matrix.

(iii) Summation over the minor diagonals, or offdiagonals, i.e. all pairs with the same
$k_i - k_j$ up to $k_{\max} = \max_i\{l(i)\}$, and normalization, gives us

$$ \tilde{a}_k = \sum_{i=0}^{k_{\max}-k} c_{i,k+i}, \qquad A := \sum_{k=0}^{k_{\max}} \tilde{a}_k, \qquad \forall k:\; a_k := \tilde{a}_k / A. \tag{26.3} $$

(iv) Then OdC is defined as an entropy measure on this normalized distribution
(here it is understood that $0 \ln(0) = 0$),

$$ \mathrm{OdC} = - \sum_{k=0}^{k_{\max}} a_k \ln a_k. \tag{26.4} $$

Examples: For a $d$-dimensional orthogonal lattice, all nodes have degree $2d$,
and the node-node link correlation matrix has only one nonzero entry at row $2d$
and column $2d$. For a fully connected graph, all nodes have degree $N-1$, so the
single entry is at row $N-1$ and column $N-1$. Obviously, for regular graphs
(where all nodes have a fixed degree $k$) OdC = 0 holds in general.

OdC is an approximate complexity estimator that takes the value zero for a regular
lattice, zero for a fully connected graph, low values for a random graph, and higher
values for 'apparently complex' structures. One main advantage is that it does not
involve costly (high-order or NP-complete) computations.
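
Steps (i) to (iv) of the definition can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, the adjacency matrix is assumed to be a symmetric list of lists with zero diagonal, and each undirected link is assumed to be counted once (by visiting only pairs i < j), with the smaller degree used as row index.

```python
import math


def degree_correlation_matrix(adj):
    """Steps (i)-(ii): degrees l(i) and link counts c[m][n] with m <= n."""
    n_nodes = len(adj)
    deg = [sum(row) for row in adj]                  # Eq. (26.1)
    k_max = max(deg)
    c = [[0] * (k_max + 1) for _ in range(k_max + 1)]
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):              # assumption: each link once
            if adj[i][j]:
                m, n = sorted((deg[i], deg[j]))      # ordered pair, m <= n
                c[m][n] += 1
    return c, k_max


def offdiagonal_complexity(adj):
    """Steps (iii)-(iv): sum over offdiagonals, normalize, take the entropy."""
    c, k_max = degree_correlation_matrix(adj)
    diag = [sum(c[i][i + k] for i in range(k_max - k + 1))
            for k in range(k_max + 1)]
    total = sum(diag)
    if total == 0:
        return 0.0                                   # graph without links
    # Terms with a_k = 0 are skipped, i.e. 0 ln(0) = 0.
    return sum(-(d / total) * math.log(d / total) for d in diag if d > 0)


# A ring is a regular graph (all degrees 2): a single matrix entry.
ring6 = [[1 if abs(i - j) in (1, 5) else 0 for j in range(6)] for i in range(6)]
# A path 0-1-2-3 has degrees 1, 2, 2, 1: two offdiagonals are populated.
path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

print(offdiagonal_complexity(ring6))
print(offdiagonal_complexity(path4))
```

For the ring all links fall on the main diagonal, so the entropy vanishes, in line with the statement on regular graphs. For the path, one link joins two degree-2 nodes and two links join a degree-1 to a degree-2 node, so the normalized distribution is (1/3, 2/3) and the entropy is ln 3 - (2/3) ln 2. The double loop over node pairs costs O(N^2) operations for a dense matrix.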
26.4.1 Illustration with a spatial network

A spatial hierarchical network emerging from a self-organizing process has recently
been introduced by Sakaguchi [16], as shown in Fig. 26.1a. This snapshot example
is now taken to illustrate how the node-node link correlation matrix and the OdC
entropy are modified under a random reshuffling of links.



(a) Self-organized structure by Sakaguchi

k 1 2 3 4 5 6 7 8
#k 10 8 6 4 1 1 1

[Link correlation matrix: figure panel]

The vector of diagonal sums is (7, 11, 4, 7, 0, 4, 3, 5).
Resulting entropy: OdC = 1.858622

(b) Same network, links partly randomized (1 move/node)

k 1 2 3 4 5 6 7 8
#k 8 7 8 5 2 1 0 0

[Link correlation matrix: figure panel]

The vector of diagonal sums is (5, 15, 16, 2, 2, 1, 0, 0).
Resulting entropy: OdC = 1.376939

The random reshuffling lowers the OdC entropy away from OdC_max = 2.550838.

Fig. 26.1. (a) Self-organized structure by Sakaguchi. (b) Randomly rewired network.
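
The two entropy values quoted in Fig. 26.1 can be checked directly from the vectors of diagonal sums, using the natural logarithm and the convention 0 ln(0) = 0 from Eq. (26.4). The short Python sketch below does this; the function name `entropy_of_counts` is a hypothetical helper, and only the two vectors are taken from the figure.

```python
import math


def entropy_of_counts(counts):
    """Shannon entropy (natural log) of counts normalized to unit sum."""
    total = sum(counts)
    return sum(-(c / total) * math.log(c / total) for c in counts if c > 0)


diag_a = [7, 11, 4, 7, 0, 4, 3, 5]      # (a) self-organized structure
diag_b = [5, 15, 16, 2, 2, 1, 0, 0]     # (b) partly randomized links

odc_a = entropy_of_counts(diag_a)
odc_b = entropy_of_counts(diag_b)

print(sum(diag_a), round(odc_a, 4))
print(sum(diag_b), round(odc_b, 4))
```

Both vectors sum to 41 links, and the computed entropies agree with the quoted values 1.858622 and 1.376939 to the displayed precision. In the randomized network most links concentrate on the first three offdiagonals, which is what lowers the entropy.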
26.5 Application to the Helicobacter pylori protein interaction
graph and reshuffling to a random graph

To demonstrate that OdC can distinguish between random graphs and complex
networks, the Helicobacter pylori protein interaction graph [17] has been chosen.
For different rewiring probabilities $p$ and $10^2$ realizations each, the links have been
reshuffled, ending up with a random graph for $p = 1$. As can be seen in Fig. 26.2,
rewiring lowers the Offdiagonal Complexity in every case.
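
The exact rewiring move is not spelled out in the text, so the following Python sketch shows only one plausible variant, stated as an assumption: every link is, with probability p, removed and replaced by a link between a uniformly drawn pair of currently unconnected nodes. The function name `rewire` and its `rng` parameter are hypothetical. The number of links is conserved, p = 0 returns the original graph, and p = 1 moves every link.

```python
import random


def rewire(adj, p, rng=None):
    """Return a copy of adj in which each link is moved with probability p."""
    rng = rng or random.Random()
    n = len(adj)
    new = [row[:] for row in adj]
    links = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]
    if len(links) == n * (n - 1) // 2:
        return new                       # complete graph: no free pair to move to
    for i, j in links:
        if rng.random() < p:
            while True:                  # draw a currently unconnected pair
                a, b = rng.sample(range(n), 2)
                if not new[a][b]:
                    break
            new[i][j] = new[j][i] = 0    # remove the old link
            new[a][b] = new[b][a] = 1    # insert the new one
    return new


# Example: a ring of 10 nodes, fully rewired with a fixed seed.
ring = [[1 if abs(i - j) in (1, 9) else 0 for j in range(10)] for i in range(10)]
randomized = rewire(ring, 1.0, random.Random(0))
print(sum(map(sum, randomized)) // 2)    # number of links is unchanged
```

Averaging a complexity measure over many such realizations per value of p gives a curve of the kind described for Fig. 26.2; the particular move used here may differ from the one behind the published figure.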




[Figure: OdC versus rewiring probability p, from the Helicobacter pylori network (p = 0) to a random graph (p = 1).]

Fig. 26.2. OdC for random reshufflings of the Helicobacter pylori network (left, p = 0)
up to a rewiring probability of p = 1 (right). The bold line shows the average; five OdC
trajectories along a rewiring path are shown for illustration (thin lines).



26.6 Application to spatial cell division networks 

The tiny (1 mm) nematode worm Caenorhabditis elegans looks like a quite primitive
organism, but it nevertheless has a nervous system and muscles, and thus shares functional
organs with higher-developed animals. More importantly, it shows a morphogenetic
process from a single-cell egg through morphogenesis to an adult worm. Towards
an understanding of the genetic mechanisms of the cell division cycle in general,
C. elegans has become one of the genetically best studied animals. Despite that,
little is known (in the sense of a dynamical model) about how the cell division and
spatial reorganization take place. Not even the spatial organization of cells during
morphogenesis is well described.

26.6.1 Early development of C. elegans

The earliest embryo development states of Caenorhabditis elegans have recently been
recorded experimentally and described quantitatively [18]. The cell division
development has been described in simplicial spaces, and the cell division
operations are described by operators in finite linear spaces [19].
26.6.2 Topological structure during premorphogenesis

The premorphogenetic phase of development runs until the embryo reaches a state
of about 385 cells. The detailed division times and spatial cell movement trajectories
follow with high precision a mechanism prescribed in the genetic program. While
many of the genetic mechanisms are known, especially for C. elegans, we are still a long
way from a mathematical model of the cell division and spatial organization
derived directly from the genome. Thus it is still desirable to develop mathematical models
for this spatiotemporal process, and to compare them with quantitative experimental
data.

The cell adjacency is known experimentally with good reliability [18, 19] in a
number of intermediate steps, which in the remainder we call cell states. Here
we focus on the adjacency matrices of the cells describing each intermediate state
between cell divisions and cell migrations, and investigate the complexity of
neighborhood relations.

26.6.3 Increasing complexity of C. elegans states

The results for 28 state matrices are shown in Fig. 26.3. The dashed line shows the
supremum value ($= \ln N$) a graph of the same size could reach, although
for combinatorial reasons this supremum is not necessarily reached.

The moderate decay in the last two states may be due to the fact that (at least
for Poisson-like link distributions) the summation implies some self-averaging if one
wants to compare networks of different size. One way to avoid this problem is to
define the complexity measure from all $k_{\max} \cdot (k_{\max} - 1)$ entries,

$$ \mathrm{FOdC} := - \sum_{i=0}^{k_{\max}} \sum_{j=i}^{k_{\max}} c_{ij} \ln(c_{ij}). \tag{26.5} $$

This can be called Full Offdiagonal Complexity, as the full set of matrix entries is
taken into account. The result for FOdC is shown in Fig. 26.3.
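
A possible way to evaluate FOdC numerically is sketched below in Python. It rests on two assumptions that are not stated explicitly in the text: the matrix entries are normalized to unit sum before the entropy is taken (otherwise the expression would not be an entropy), and each undirected link is counted once. The function name `full_offdiagonal_complexity` is hypothetical.

```python
import math


def full_offdiagonal_complexity(adj):
    """Entropy over all entries of the degree-degree link correlation matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    counts = {}                                   # (m, n) with m <= n -> links
    for i in range(n):
        for j in range(i + 1, n):                 # assumption: each link once
            if adj[i][j]:
                key = tuple(sorted((deg[i], deg[j])))
                counts[key] = counts.get(key, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Assumption: entries are normalized to unit sum before taking the entropy.
    return sum(-(c / total) * math.log(c / total) for c in counts.values())


def from_links(n_nodes, links):
    """Helper: symmetric 0/1 adjacency matrix from a link list."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for i, j in links:
        adj[i][j] = adj[j][i] = 1
    return adj


# Center 0 (degree 3), middle nodes 1-3 (degree 2), leaves 4-6 (degree 1).
spider = from_links(7, [(0, 1), (0, 2), (0, 3), (1, 4), (2, 5), (3, 6)])
fodc_spider = full_offdiagonal_complexity(spider)
print(fodc_spider)
```

In this example three links join degrees (2, 3) and three join degrees (1, 2). Both entries lie on the same offdiagonal, so the offdiagonal sum of Eq. (26.3) merges them and OdC vanishes, whereas FOdC keeps them apart and gives ln 2. This illustrates the loss of resolution through summation that FOdC is meant to avoid.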




Fig. 26.3. Left: Offdiagonal complexity of the network states. The dashed line shows the
maximal complexity a graph with the same number of nodes could reach. Right: Full Offdiagonal
complexity. Here all possible pairs of nodes contribute to the complexity.
26.6.4 Saturation for large network size

As expected, the complexity of the spatial cell structure increases along the first
premorphogenetic phase. Compared to the maximal possible complexity that could be
reached by a graph with the same number of node degrees (but not by a three-dimensional
cell complex), the complexity, as measured by OdC, saturates. This has a straightforward
explanation: The limiting case of a large homogeneous cell agglomerate
would end up with roughly two classes of cells (at the surface and within the bulk) and
thus three classes of neighborhood pairs: bulk-bulk, bulk-surface and surface-surface
(see Fig. 26.4). As the coordination numbers within bulk and surface fluctuate, this
effectively limits the growth of possible different neighborhood geometries. After
initial growth, FOdC resolves fluctuations corresponding to the effect of alternating
cell division and spatial reorganization.



Fig. 26.4. Intuitive explanation of saturation for large homogeneous spatial networks.
From left to right: Bulk-bulk, bulk-surface, and surface-surface are the typical pairs of
node degrees. For large cell aggregates, surface and bulk cells become more homogeneous, i.e.
the variation of the neighborhood degree decreases.



26.7 Conclusions and Outlook 

A new complexity measure for graphs and networks has been proposed. Contrary
to other approaches, it can be applied to undirected binary graphs. The motivation
of its definition is twofold: The first observation is that the binning of link distributions
is problematic for small networks. The second observation, following from this, is that if one
uses a "biased link entropy" instead of the (plain) entropy of the link distribution, which
is insignificant for scale-free networks, it has an extremum where the exponent
of the power law is met.

The central idea of OdC is to apply an entropy measure to the link correlation
matrix, after summation over the offdiagonals. This allows for a quantitative, yet
still approximate, measure of complexity. OdC is roughly 'hierarchy sensitive' and
has the main advantage of being computationally cheap.